R version 3.6
Libraries: stringr, MASS, RColorBrewer, ape, ggplot2, ggtree, plyr
Operating systems: Windows, macOS. The comparison code has not been tested on Linux-based systems.
To run a simulation, see the User-Guide-tugHall_v_2.0 file. To run the test simulations, see the folder /Results_of_tests/ and the description in the file User-Guide-Tests_v2.0. This document only explains the procedure for comparing the cell-based and clone-based codes. To reproduce the plots with new data, use the R script REPORT.R. To animate the plots, use the script /GIF_in_R/GIF_exmple.R.
The clone-based code was designed to accelerate the calculation and to increase the number of cells that can be simulated. The advantage of the clone-based algorithm is that a single call of the trial() function performs the trial for all cells of one clone, whereas in the cell-based algorithm trial() is applied to each cell separately. If the number of cells equals the number of clones, the speed-up is 1; in all other cases the clone-based code is faster.
Another reason is the case where a huge number of cells, such as \(10^7\) or \(10^9\), must be simulated while the mutation rate is very low. The cell-based algorithm then has a huge computational cost, whereas the clone-based algorithm runs very fast as long as mutated cells appear slowly.
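As a rough illustration of the argument above, the following sketch counts how many trial() calls each approach would need in one time step. The clone sizes are made-up numbers, and count_trials() is a helper introduced only for this example; it is not part of tugHall. The speed-up estimate assumes that one trial() call costs about the same in both codes, which is a simplification.

```r
# Hypothetical clone sizes (number of cells in each clone); illustrative only.
clone_sizes <- c(1e7, 250, 30, 1)

# Count trial() calls per time step under each scheme.
# Cell-based: one call per cell. Clone-based: one call per clone.
# Assumption: every call has a similar cost, so the ratio estimates the speed-up.
count_trials <- function(n_cells) {
  list(cell_based  = sum(n_cells),
       clone_based = length(n_cells),
       speed_up    = sum(n_cells) / length(n_cells))
}

res <- count_trials(clone_sizes)
print(res$speed_up)

# Worst case: every clone holds exactly one cell, so the estimated speed-up is 1.
worst <- count_trials(rep(1, 100))
print(worst$speed_up)
```

With few large clones the ratio is large, and it shrinks toward 1 as the population fragments into many small clones.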
\(N_{cells} = N_{cells} - Binom(p,N_{cells})\),
where \(Binom(p,N_{cells})\) is a random number drawn from the binomial distribution and \(N_{cells}\) is the number of cells in a clone. The probability \(p\) is the probability of one of the death processes, for example \(p = a'\) or \(p = k\).
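A minimal sketch of this death step in base R is given below, together with a per-cell version for comparison. The function names death_step() and death_step_per_cell() and the values of n0 and p are assumptions made for illustration; they are not taken from the tugHall code.

```r
# Clone-based death step: one binomial draw handles all cells of a clone.
# Hypothetical helper, not a tugHall function.
death_step <- function(n_cells, p) {
  n_dead <- rbinom(1, size = n_cells, prob = p)  # number of cells that die
  n_cells - n_dead
}

# Cell-based counterpart: one Bernoulli trial per cell.
death_step_per_cell <- function(n_cells, p) {
  n_cells - sum(runif(n_cells) < p)
}

set.seed(1)
n0 <- 1000   # illustrative clone size
p  <- 0.1    # illustrative death probability
clone_result <- replicate(2000, death_step(n0, p))
cell_result  <- replicate(2000, death_step_per_cell(n0, p))

# Both averages should be close to n0 * (1 - p).
print(c(mean(clone_result), mean(cell_result)))
```

Both versions sample the same distribution of surviving cells, but the clone-based one needs a single random draw per clone instead of one per cell.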
\(N_{cells} = N_{cells} + Binom(d',N_{cells})\)
\(N_{new\_clones} = Binom(m,N_{new\_cells})\),
\(N_{new\_cells} = Binom(d',N_{cells})\).
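The three formulas above can be sketched in base R as follows. The helper division_step() and the parameter values are hypothetical. The sketch follows the formulas literally: all new cells are added to the parent clone, and the number of new clones is drawn from the new cells. Whether the cells that found new clones are afterwards removed from the parent clone is not specified here, so the sketch leaves them in place.

```r
# Hypothetical helper, not a tugHall function.
# d_prime: probability used in Binom(d', N_cells)
# m:       probability used in Binom(m, N_new_cells)
division_step <- function(n_cells, d_prime, m) {
  n_new_cells  <- rbinom(1, size = n_cells, prob = d_prime)
  n_new_clones <- rbinom(1, size = n_new_cells, prob = m)
  list(n_cells      = n_cells + n_new_cells,  # updated size of the parent clone
       n_new_cells  = n_new_cells,
       n_new_clones = n_new_clones)
}

set.seed(2)
# Illustrative values: a large clone with a very low value of m.
out <- division_step(n_cells = 1e7, d_prime = 0.3, m = 1e-6)
print(out)
```

For a very low \(m\), most time steps produce few or no new clones, so the number of clones, and with it the number of trials, grows slowly.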
\(\overline{x} = \frac{\sum_i x_i \cdot N_{cells,i} }{ \sum_i N_{cells,i} }\),
where the sum runs over all clones \(i = 1, \dots, N_{clones}\).
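The weighted average can be checked with a few lines of base R. The values of x and n_cells below are invented for the example. The last line expands the clones into individual cells to show that the clone-weighted average equals the plain per-cell average.

```r
# Illustrative per-clone values and clone sizes (not from the tugHall data).
x       <- c(0.2, 0.5, 0.9)
n_cells <- c(100, 10, 1)

# Average over cells, computed from clone-level data as in the formula above.
x_bar <- sum(x * n_cells) / sum(n_cells)

# The same quantity with the base R helper weighted.mean().
x_bar_wm <- weighted.mean(x, w = n_cells)

# Cell-based check: repeat each clone value once per cell and take the mean.
x_bar_cells <- mean(rep(x, times = n_cells))

print(c(x_bar, x_bar_wm, x_bar_cells))
```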
For the comparison we used the folder /COMPARISON_CLONE_CELL/. To reduce the statistical error, we repeated the test simulations 1080 times with both the cell-based and the clone-based codes. The results of the simulations are in the folders /RESULTS_CELLS/ and /RESULTS_CLONES/, respectively.
To compare the distributions of the simulation data, we use here the data of the test for environmental death of metastasis cells. Fig. 1 shows a comparison of the distributions of the number of metastasis cells during the test simulations, repeated 100 times, together with the acceleration rate of the clone-based calculation relative to the cell-based one. We expected an acceleration rate of around 1000, because of the number of cells, but the averaging process or the memory allocation during the parallel calculation seemingly takes additional computational resources. That is why the acceleration is around 10000 times; in other words, it is larger than the number of cells. The data for the clone-based code were obtained on a personal computer with 1 processor, with a simulation time of around 5-9 minutes. The data for the cell-based code were obtained on an HPC supercomputer using 9 nodes with 24 cores each (216 processors in total), which took around 4-6 hours.
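The order of magnitude of the acceleration can be checked from the run times quoted above. The sketch below assumes that the acceleration is measured as a ratio of processor-time (number of processors multiplied by wall-clock time) and that all 216 processors were busy for the whole run. Both are assumptions of this example; the rates plotted in Fig. 1 may have been computed differently.

```r
# Numbers quoted in the text.
cell_procs  <- 216       # 9 nodes x 24 cores
cell_hours  <- c(4, 6)   # wall-clock range for the cell-based code
clone_procs <- 1
clone_min   <- c(5, 9)   # wall-clock range for the clone-based code

# Processor-time in processor-minutes (assumed measure of cost).
cell_proc_min  <- cell_procs * cell_hours * 60
clone_proc_min <- clone_procs * clone_min

# Lower and upper bounds of the processor-time ratio.
ratio_low  <- min(cell_proc_min) / max(clone_proc_min)
ratio_high <- max(cell_proc_min) / min(clone_proc_min)
print(c(ratio_low, ratio_high))
```

Under these assumptions the ratio lies between roughly 5800 and 15600, which is consistent with an acceleration of the order of 10000.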
The data show some discrepancy for 100 trials, but the average values and the widths of the distributions are almost the same. To make sure that the discrepancy is due to the limited number of trials, we repeated the simulations 1080 times and compared the accuracy of the clone-based and cell-based codes.
Fig. 1. Evolution of the number of metastasis cells for 100 trials. Left: animation of the distributions at different time steps (10, 20, 30, 40, 50, 60, 70, 80, 90). Right: acceleration rate for the different tests: Ap_Pr is the apoptosis test for primary cells, Ap_Met is the apoptosis test for metastasis cells, Env_Pr is the environment test for primary cells, Env_Met is the environment test for metastasis cells, and Met_Tr is the metastasis transformation test.
Fig. 2 shows a comparison of the distributions of the number of metastasis cells during the test simulations, repeated 1080 times. As we can see, the statistical discrepancies of the clone-based and cell-based codes are the same, and the statistical error for 1080 trials is smaller than for 100 trials (Fig. 1).
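The effect of the number of trials on the statistical error can be illustrated with a toy model in base R. This is not the tugHall test itself: the initial cell number n0, the death probability p, and the helper se_of_mean() are assumptions chosen only to show that the standard error of the mean scales roughly as \(1/\sqrt{N_{trials}}\).

```r
set.seed(42)
n0 <- 1000   # illustrative initial number of cells
p  <- 0.1    # illustrative death probability

# Standard error of the mean number of surviving cells over n_trials repeats.
# Hypothetical helper, not a tugHall function.
se_of_mean <- function(n_trials) {
  survivors <- n0 - rbinom(n_trials, size = n0, prob = p)
  sd(survivors) / sqrt(n_trials)
}

se_100  <- se_of_mean(100)
se_1080 <- se_of_mean(1080)

# Ratio expected from the 1/sqrt(N_trials) scaling.
expected_ratio <- sqrt(1080 / 100)
print(c(se_100 / se_1080, expected_ratio))
```

Going from 100 to 1080 trials should therefore reduce the statistical error of the averages by a factor of about 3.3.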